Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelized cluster hosts discovering + async config cache update #4077

Merged
merged 8 commits into from
Nov 28, 2024

Conversation

karol-kokoszka
Copy link
Collaborator

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@karol-kokoszka
Copy link
Collaborator Author

@Michal-Leszczynski Please take a look on the PR.

@Michal-Leszczynski
Copy link
Collaborator

@karol-kokoszka It looks like tests are failing.

@karol-kokoszka
Copy link
Collaborator Author

Yes, some of them are failing and I have no clue yet why.

@karol-kokoszka
Copy link
Collaborator Author

And I'm not able to reproduce it locally, using the same setup as was used on failing test.
It basically PASSes locally.
I'm starting to think that it's something related to the env on gh actions.

@karol-kokoszka
Copy link
Collaborator Author

Looks that ensuring that all nodes are UP on the test that was failing previously helped.
We are starting/stopping services using supervisorctl as we refer to the docker environment. Service start doesn't mean that the node is UP.
I added simple utility function that ensures that nodes are UP.

@Michal-Leszczynski please review.

pkg/service/cluster/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/cluster/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/cluster/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/cluster/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/cluster/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/cluster/service.go Show resolved Hide resolved
pkg/service/healthcheck/service_integration_test.go Outdated Show resolved Hide resolved
pkg/service/healthcheck/service_integration_test.go Outdated Show resolved Hide resolved
pkg/testutils/exec.go Outdated Show resolved Hide resolved
pkg/service/healthcheck/service_integration_test.go Outdated Show resolved Hide resolved
@Michal-Leszczynski
Copy link
Collaborator

@karol-kokoszka I resolved some already fixed comments.
Comments regarding checking node state in tests (com1, com2, com3) are tied to this conversation.
When all comments are resolved, I will re-review this PR and we will hopefully be able to merge it!

…date async

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@karol-kokoszka karol-kokoszka merged commit da83642 into master Nov 28, 2024
51 checks passed
@karol-kokoszka karol-kokoszka deleted the kk/4074 branch November 28, 2024 12:21
Michal-Leszczynski pushed a commit that referenced this pull request Dec 10, 2024
…4077)

* fix(cluster): parallelize discovering cluster hosts and make cache update async

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.

(cherry picked from commit da83642)
Michal-Leszczynski pushed a commit that referenced this pull request Dec 10, 2024
…4077)

* fix(cluster): parallelize discovering cluster hosts and make cache update async

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.

(cherry picked from commit da83642)
Michal-Leszczynski pushed a commit that referenced this pull request Dec 11, 2024
…4077)

* fix(cluster): parallelize discovering cluster hosts and make cache update async

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.

(cherry picked from commit da83642)
Michal-Leszczynski pushed a commit that referenced this pull request Dec 11, 2024
…4077)

* fix(cluster): parallelize discovering cluster hosts and make cache update async

Fixes #4074

This PR makes the cluster hosts discovery more robust, as when the cluster.host is DOWN,
it probes all other hosts in parallel and returns response from the fastest one.
Additionally, this PR makes the call to cluster config cache async on updating cluster.

(cherry picked from commit da83642)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster update API call may timeout when one of the known hosts is down
2 participants